perm filename CIRCUM[W78,JMC]3 blob sn#423302 filedate 1979-03-06 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00006 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	.require "memo.pub[let,jmc]" source
C00003 00003	.bb Minimal entailment
C00027 00004	.bb Minimal models and minimal entailment in propositional calculus.
C00033 00005	.bb Notes
C00037 00006	.skip 1
C00038 ENDMK
C⊗;
.require "memo.pub[let,jmc]" source;
.every heading (,draft,{page})
.font B "zero30"
.at "Goedel" ⊂"G%B:%*odel"⊃
.turn on "_" for "#"
.turn on "α"
.turn on "↑"

.GROUP SKIP 20
.cb CIRCUMSCRIPTION INDUCTION - A WAY OF JUMPING TO CONCLUSIONS


Abstract: Humans and intelligent computer programs must often jump to
the conclusion that the objects they know about are the only
objects involved in a problem.  %2Circumscription induction%1 is
one formalization of this kind of approximate reasoning.

.skip to column 1
.bb Minimal entailment

	%2Minimal reasoning%1 has a semantic form called %2minimal
entailment%1 and a syntactic form called %2minimal inference%1.  We begin
with %2minimal entailment%1.

	Let ⊗A be a set of sentences of first order logic and ⊗p a
sentence.  As is customary in logic, we say that %2A entails p%1 (written
%2A %8q%2 p%1) iff ⊗p is true in all models of ⊗A.  Suppose ⊗M1 and ⊗M2 
are two models of ⊗A.  ⊗M1 is called a submodel of ⊗M2 (written %2M1 ≤
M2%1) iff %2domain(M1) ⊂ domain(M2)%1, and the predicates when applied to
the elements of %2domain(M1)%1 have the same values in ⊗M2 that they have
in ⊗M1.  ⊗M is called a minimal model of ⊗A iff it has no proper
submodels.  We say that ⊗A %2minimally entails p%1 (written %2A %8q%4m%2
p%1) iff ⊗p is true in all minimal models of ⊗A.

	Minimal entailment is not equivalent to deduction or
ordinary entailment in any logical system.
For any form of deduction, if %2A %8r%2 p%1
and %2A ⊂ B%1, then %2B %8r%2 p%1.  We may call this property
%2monotonicity%1.  Exactly the same proof that
proved ⊗p from ⊗A will
prove it from ⊗B.  Entailment is also monotonic;
if ⊗p is true in all models of ⊗A, it is true in all
models of ⊗B, because the latter are a subset of the models of ⊗A. 
However, minimal entailment is not monotonic.  Namely,
consider the null set of sentences.  Its minimal model is a domain
with one element (in most formalizations) so that the sentence
%2∀x_y.x_=_y%1 is minimally entailed by the null set.  However, the set
⊗B consisting of the negation of this sentence certainly
contains the null set.
Note also that minimal entailment is not transitive; we can have
%2p %8q%2 q%1 and %2p %8q%2 q%1 without having %2p %8q%2 q%1.

	Here are some mathematical examples.  The minimal model of the
Peano axioms without the schema of induction is the standard model of the
integers provided the constant 0 is explicitly mentioned.  The system
consisting only of the axioms for the %2successor%1 operation omitting
axioms for 0 has no minimal model, because any model has a submodel.
There is also no minimal model if 0 is characterized only by an
existential axiom, %2∃z.∀x.successor_x_≠_z%1.  The minimal model of the
axioms for group theory is the group of one element.  If we add a sentence
asserting that there is more than one element, then the minimal models are
the cyclic groups of prime order and the infinite cyclic group.
.skip 1

.bb Minimal inference.

	Minimal entailment is a semantic mode of reasoning
and is in general infinitary.  Indeed, since the minimal
model of the usual first order axioms for the integers is the %2standard
model%1, the minimal consequences of the axioms are just the true
sentences of arithmetic, and this set isn't recursively enumerable or even
arithmetic.  There is, however, a corresponding finitary syntactic mode
of reasoning.  Since by Goedel's theorem it won't be complete, there will
be stronger modes of reasoning, and the correspondence with minimal
entailment won't be exact.

	The idea is to imitate the axiom schema of induction of Peano arithmetic.
Let ⊗A be the conjunction of a certain subset of our axioms called the
primary axioms.  The criterion for calling some axioms primary and others
secondary is not yet entirely clear, but some
separation seems desirable.  Let %2A%A↑F%1 be the result of relativizing
⊗A with respect to the predicate parameter %AF%1.  (%2A%A↑F%1 is obtained
from ⊗A by replacing every universal quantifier %2∀x%1 by
%2∀x.%AF%2(x)_⊃%1 and every existential quantifier %2∃x%1 by
%2∃x.%AF%2(x)_∧%1.  For each constant ⊗a appearing in ⊗A, we must adjoin
%AF%2(a)%1 and for each function symbol ⊗f, we must adjoin the axiom
%2∀x.(%AF%2(x)_⊃_%AF%2(f(x))%1)).  We introduce the schema

	%2A%A↑F%2 ⊃ ∀x.%AF%2(x)%1

which we call %2the minimal completion schema%1 of ⊗A and write %2MCS(A)%1.
We will call the system obtained by adjoining  the schema %2MCS(A)%1 
to the axiom ⊗A the %2minimal completion of A%1, writing it %2MC(A)%1.
%2MCS(A)%1 is an attempt to assert that the only objects are those
required to exist by the axioms ⊗A.

	Here is a trivial example of a minimal completion.  Suppose we
have a single axiom %2ok(a)%1 in our primary set.  Its relativization will
be %AF%2(a)_∧_ok(a)%1.  The minimal completion schema is then

	%AF%2(a) ∧ ok(a) ⊃ ∀x.%AF%2(x)%1.

If we substitute %2λx.(x_=_a)%1 for %AF%1 in the schema we can
conclude %2∀x.(x_=_a)%1 which characterizes the minimal model
of that axiom.  Indeed, Martin Davis showed me that if every model of ⊗A 
has a minimal submodel and a sentence %Af%1 is true in any supermodel
of any model of %Af%1, then %Af%1 is true in all minimal models of ⊗A iff it is
provable in the minimal completion of ⊗A. 

	Let ⊗s1 be the sentence %2∀x.ok(x)_=_p%1.
⊗p is provable in %2MC(α{ok(a)}_∪_s1%1, but it is not provable
in %2MC(α{ok(a)_∧_s1}%1, because the later admits minimal models
in which ⊗p is false and there is an individual ⊗b such that
⊗¬ok(b).  Therefore, it seems that we have to divide our sentences
into two groups if we want minimal inference to be useful - a primary
group whose minimal completion we form, and a secondary group which
is used but which is not completed.  The criteria for deciding in
which group to put a fact are determined by how we want Occam's
razor to work.  We hope to clarify this in a later version of this
paper.

	We can form the minimal completion of the algebraic Peano
axioms as follows:  We write the axioms

!!a1:	%2∀n.n' ≠ 0%1,

!!a2:	%2∀n.(n')%5-%2 = n%1,

and

!!a3:	%2∀n.(n ≠ 0 ⊃ n%5-%2' = n)%1,

where %2n'%1 denotes the successor of %2n%1 and %2n%5-%1 denotes
the predecessor of %2n%1.
The minimization schema of this set of axioms is

!!a4:	%2%AF%2(0) ∧ ∀n.(%AF%2(n) ⊃ %AF%2(n')) ∧ ∀n.(%AF%2(n) ⊃ %AF%2(n%5-%2)) ∧
∀n.(%AF%2(n) ⊃ n' ≠ 0) ∧ ∀n.(%AF%2(n) ⊃ (n')%5-%2 = n) ∧
∀n.(%AF%2(n) ⊃ (n≠0 ⊃ (n%5-%2)' = n))) ⊃ ∀n.%AF%2(n)%1.

	This is much longer than the usual schema of induction,
but can be shown equivalent to it.  Notice that the conjuncts in
the premiss that relativize the axioms ({eq a1})-({eq a3}) are
redundant; because since they contain only universal quantifiers,
they are weaker than the axioms they relativize.
Thus Peano arithmetic is the minimal completion of its algebraic
subsystem.  As
usual, when the axioms admit only infinite models, the
schema will not be sufficient to enforce the minimal model, and
there will also be %2non-standard%1 models.
Martin Davis has also shown that if 0 is introduced through an existential
axiom rather than as a constant, the minimal completion is inconsistent;
this corresponds to the fact that this system has no minimal
model.

.skip 1
.bb Applications
%1
	We envisage using minimal inference in AI as follows:
Suppose a system has a certain collection of facts.  It may or
may not have all the relevant facts, but it is reasonable for
it to conjecture that does and examine the consequences of the
conjecture.  The minimal completion axiom for a set of statements
is an expression of the hypothesis that the only objects that
exist are those which must exist on the basis of these statements.
In order to make this workable in practice, it will be necessary
to apply minimal completion to subsets of the facts; different
subsets will give rise to different conjectures.  Moreover, we
will need to apply the relativization to only some of the sorts
of individuals, so that we can remain non-committal about whether
there are more of certain sorts than our axioms prove to exist.

	An example that has been much
studied in AI and which contains enough detail to illustrate
various issues is the %2missionaries and cannibals%1 problem, stated
as follows:

	%2"Three missionaries and three cannibals come to a river.  A
rowboat that seats two is available.  However, if the cannibals ever
outnumber the missionaries on either bank of the river, the outnumbered
missionaries will be eaten.  How shall they cross the river?"%1.

	Amarel (1971) considered various representations of the problem
and discussed criteria whereby the following representation is preferred
for purposes of AI, because it leads to the smallest state space that must
be explored to find the solution.  We represent a state by the triplet
consisting of the numbers of missionaries, cannibals and boats on the
initial bank of the river. The initial configuration is 331, the desired
final configuration is 000, and one solution is given
by the sequence (331,220,321,300,311,110,221,020,031,010,021,000)
of states.

	We are not presently concerned with the heuristics of the problem
but rather with the correctness of the reasoning that goes from the
English statement of the problem to Amarel's state space representation.
A generally intelligent computer program should be able to
carry out this reasoning.  Of
course there are the well known difficulties in making computers
understand English, but suppose the English sentences describing the
problem have already been rather directly translated into first order
logic.  The correctness of Amarel's representation is not an ordinary
logical consequence of these sentences for two further reasons.

	First, nothing has been stated about the properties of boats or
even the fact that rowing across the river doesn't change the numbers of
missionaries or cannibals or the capacity of the boat.  These are a
consequence of common sense knowledge, so let us imagine that common sense
knowledge, or at least the relevant part of it, is also expressed in first
order logic.

	The second reason we can't ⊗deduce the propriety of Amarel's
representation is deeper.  Imagine giving someone the problem, and after
he puzzles for a while, he suggests going upstream half a mile and
crossing on a bridge.  "What bridge", you say.  "No bridge is mentioned in
the statement of the problem."  And this dunce replies, "Well, it isn't
stated that there isn't a bridge".  You look at the English and even at
the translation of the English into first order logic, and you must admit
that it isn't stated that there is no bridge.  So you modify the problem
to exclude bridges
and pose it again, and the dunce proposes a helicopter, and after you
exclude that he proposes a winged horse or that the others hang onto the
outside of the boat while two row.  You now see that while a dunce, he is
an inventive dunce, and you despair of getting him to accept the problem
in the proper puzzler's spirit and tell him the solution.  You are further
annoyed when he attacks your solution on the grounds that the boat might
have a leak or
not have oars.  After you rectify that omission, he suggests that a sea
monster may swim up the river and may swallow the boat.  Again you are
frustrated, and you look for a mode of reasoning that will settle his hash
once and for all.

	Minimal models and minimal inference may give the solution.  In a
minimal model of the first order logic statement of the problem, there is
no bridge or helicopter.  "Ah", you say, "but there aren't any oars
either".  No, we get out of that as follows:  It is a part of common
knowledge that a boat can be used to cross a river %2unless there is
something wrong with it or something else prevents using it%1, and in the
minimal model of the facts there doesn't exist something wrong with the
boat, and there isn't anything else to prevent its use.

	The non-monotonic character of minimal reasoning is just what we
want here.  The statement, %2"There is a bridge a mile upstream."%1
doesn't contradict the text of the problem, but its addition invalidates
the Amarel representation.

	In the usual sort of puzzle, there can be no additional source
of information beyond the puzzle and common sense knowledge.  Therefore,
the convention can be explicated as taking the minimal completion
of the puzzle statement and a certain part of common sense knowledge.
If one really were sitting by a river bank and these six people came
by and posed their problem, one wouldn't immediately take the minimal
completion for granted, but one would consider it as a hypothesis.

	At first I considered this solution too ad hoc to be
acceptable, but I am now convinced that it is a correct explication of the
%2normal%1 situation, e.g. unless something abnormal exists, an object can
be used to carry out its normal function.

	Some have suggested that the difficulties might be avoided by
introducing probabilities.  They suggest that the existence of a bridge is
improbable.  Well the whole situation involving cannibals with the
postulated properties is so improbable that it is hard to take seriously
the conditional probability of a bridge given the hypotheses.  Moreover,
we mentally propose to ourselves the normal non-bridge non-sea monster
interpretation ⊗before considering these extraneous possibilities, let
alone their probabilities, i.e. we usually don't even introduce the sample
space in which these possibilities are assigned whatever probabilities one
might consider them to have.  Therefore, regardless of our knowledge of
probabilities, we need a way of formulating the normal situation from the
statement of the facts, and the minimal completion axioms seems to be a method.

	Of course, we haven't given an expression of common sense knowledge
in a form that says a boat can be used to cross rivers unless there is
something wrong with it.  In particular, we haven't introduced into
our ⊗ontology (the things that exist) a category that includes
%2something wrong with a boat%1 or a category that includes
%2something that may prevent its use%1.
Incidentally, once we have decided to admit %2something wrong with the boat%1,
we are inclined to admit a %2lack of oars%1 as such a something and to ask
questions like, %2"Is a lack of oars all that is wrong with the boat?%1".

	I imagine that many philosophers
and scientists would be reluctant to introduce such ⊗things, but the
fact that ordinary language allows %2"something wrong with the boat"%1
suggests that we shouldn't be hasty in excluding it.  Making a suitable
formalism may be technically difficult and philosophically problematical,
but we must try.

	In compensation, we may get an increased ability to understand
natural language, because if natural language admits something like
minimal reasoning, it is understandable that we will have difficulty
in translating it into logical formalisms that don't.

	The possibility of forming minimal completions of various
parts of our knowledge introduces a kind of fuzziness that I believe
may prove more fruitful than that explicated by fuzzy set theory.
.bb Minimal models and minimal entailment in propositional calculus.

	As the above indicates, developing techniques for minimal entailment
and minimal inference
and axiomatizing common sense knowledge in a form suitable
for computer use will be difficult.
Therefore it seems worthwhile to study a propositional calculus
analog of the first order logic situation even though
we haven't found any
direct applications of the propositional case.

	Suppose
we have some facts to express but have not yet chosen a propositional
language.  Our first step is to introduce
propositional letters %2p%41%2,...,p%4n%1 and to specify each of them as
expressing some elementary fact.  We call this establishing a
%2propositional co-ordinate system%1.  Then we express the set ⊗A of known
facts as a sentence ⊗AX in this co-ordinate system.  A model of the facts
is any assignment of truth values to the %2p%4i%1's that makes ⊗AX true.
Let ⊗M1 and ⊗M2 be models of ⊗AX. We define

	%2M1 ≤ M2 ≡  ∀i.(true(p%4i%2,M2) ⊃ true(p%4i%2,M1))%1.

A minimal model is one that is
not a proper extension, i.e. a model in which (roughly speaking) as many
of the %2p%4i%1s as possible are false - consistent with the truth
of %2AX%1.  %2Note that while the models of a set of facts are
independent of the co-ordinate system, the minimal models depend on the
co-ordinate system%1.  Here is the most trivial example.
Let there be one letter ⊗p representing the one fact in a
co-ordinate system ⊗C, and let ⊗A be the null set of facts so that ⊗AX is
the sentence ⊗T (identically true).  The one other co-ordinate system ⊗C' 
uses the elementary sentence %2p'%1 taken equivalent to %2¬p%1.
The minimal model of ⊗C is
α{¬pα}, and the minimal model of ⊗C' is α{pα}.
Since minimal reasoning depends on the co-ordinate system and not merely
on the facts expressed in the axioms, the utility of an axiom system
will depend on whether its minimal models express the uses we wish to
make of Occam's razor.

	A less trivial example of minimal entailment is the following.
The propositional co-ordinate system has basic sentences %2p%41%1, %2p%42%1
and %2p%43%1, and %2AX = p%41%2 ∨ p%42%1.  There are two minimal models.
%2p%43%1 is false in
both, and %2p%41%1 is false in one, and %2p%42%1 is false in the other.
Thus we have %2AX_%8q%4m%2_(¬(p%41%2_∧_p%42%2)_∧_¬p%43%1).

	Note that if ⊗AX is a conjunction of certain elementary sentences
(some of them negated), e.g. %2AX_=_p%42%2_∧_¬p%44%1, then there is a unique
minimal model, but if, as above, %2AX_=_p%42%2_∨_¬p%44%1, then there
is more than one minimal model.

	One may also consider extensions %2relative%1 to a set ⊗B of
elementary sentences.  We have

	%2M1 ≤ M2 (mod B) ≡ ∀p.(p ε B ∧ true(p,M1) ⊃ true(p,M2)%1.

We can then introduce minimal models relative to ⊗B and the corresponding
relative minimal entailment.  Thus we may care about minimality and
Occam's razor only over a certain part of our knowledge.

	Note that we may discover a set of facts one at a time so that we
have an increasing sequence %2A%41%2 ⊂ A%42%1 ⊂ ... of sets of sentences.
A given sentence ⊗p may oscillate in its minimal entailment by the %2A%4i%1,
e.g. we may have %2A%41%8_q%4m%2_p%1, %2A%42%8_q%4m%2_¬p%1, neither may be
entailed by %2A%43%1, etc.  The common sense Occam's razor conclusion
about a sentence ⊗p often behaves this way as our knowledge increases.


.skip to column 1
.bb Notes
.item←0

	#. There is a relation between minimal entailment and the
THNOT formalism of micro-planner.  (THNOT ⊗p) means that an attempt
to prove ⊗p has failed, and such a failure to prove something
wrong with the boat could lead to a plan to row across the river.
Likewise failure to prove the existence of a bridge can be used
to go from the facts to the Amarel representation.
A major difference is that minimal entailment does not depend on
what theorem proving methods are used.

	#. When we add common sense knowledge or general scientific
knowledge to the facts about a particular problem, we get a huge
collection of facts with huge models most of which is irrelevant
to the problem.  The usual approach is for the human to separate
what is relevant and make an axiom system containing only relevant
information.  If we want a computer program to express what it
knows in a certain language, we need a better way.  One way of reducing
the number of models is to say that two models are equivalent if they
are the same for certain sorts and certain predicates and functions.
We then work with the equivalence classes and try to make a set of
sentences involving only the objects in these sorts and as few others
as possible whose models correspond to these equivalence classes.

	#. In the first order logic theory we also need to consider
extensions and minimality relative to a subcollection of the
sorts.

	#. In the propositional calculus system we minimize the set of
propositional co-ordinates assigned %3true%1 while in the predicate
calculus system we minimize the set of objects asserted to exist.  It
might be worth considering co-ordinate systems in the predicate calculus
system within which it is worthwhile to minimize the set of propositions
asserted.

	#. We can use circumscription to explain how the chemical
state of a person's blood stream can affect the conclusions he makes.
Of course, the state of fatigue etc. may be an explicit datum in
some decisions, such as whether to go to bed, but fatigue also affects
conclusions about other matters.  With circumscription, we can have
two non-contradictory sentences but where one of them will lead to
the conclusion ⊗p and the other will lead to ⊗¬p.  Namely, each
sentence has an "if nothing prevents it" condition, which
circumscription satisfies if the sentence is by itself.  Hmm, it isn't
clear that applying circumscription doesn't lead to a contradiction
unless the two sentences supply preventatives for each other.
.skip 1
.begin verbatim
John McCarthy
Artificial Intelligence Laboratory
Stanford University
Stanford, California 94305
.end

%7This draft of CIRCUM[W78,JMC] PUBbed at {time} on {date}.